ABSTRACT

We present 663 QSO candidates in the Large Magellanic Cloud (LMC) selected using multiple di- 
agnostics. We started with a set of 2,566 QSO candidates selected using the methodology presented 
in our previous work based on time variability of the MACHO LMC lightcurves. We then obtained 
additional information for the candidates by crossmatching them with the Spitzer SAGE, the 2MASS, 
the Chandra, the XMM, and an LMC UBVI catalog. Using this information, we specified six diag- 
nostic features based on mid-IR colors, photometric redshifts using SED template fitting, and X-ray 
luminosities in order to further discriminate high confidence QSO candidates in the absence of spectral
information. We then trained a one-class SVM (Support Vector Machine) model using the diagnostic
features of the 58 confirmed MACHO QSOs. We applied the trained model to the original candidates
and finally selected 663 high confidence QSO candidates. Furthermore, we crossmatched these 663 
QSO candidates with the newly confirmed 151 QSOs and 275 non-QSOs in the LMC fields. On the 
basis of the counterpart analysis, we found that the false positive rate is less than 1%. 
Subject headings: Magellanic Clouds - methods: data analysis - quasars: general 



1. INTRODUCTION 

Active Galactic Nuclei (AGNs) are very energetic
extragalactic objects that have been studied in many
astronomical fields such as galaxy formation and
evolution (e.g. Heckman et al. 2004; Bower et al. 2006;
Trichas et al. 2009, 2010), large scale structure
(e.g. Ross et al. 2009), dark matter substructure (e.g.
Miranda & Maccio 2007), and black hole growth (e.g.
Kollmeier et al. 2006).



It is known that QSOs show strong variability over a
wide range of wavelengths on time scales from a few
days to several years (Hook et al. 1994; Hawkins 2002).
It is widely believed that the variability is associated with
accretion disk instability (Rees 1984; Kawaguchi et al.
1998). Recently, interesting studies on QSO variability
have been published (Kelly et al. 2009; MacLeod et al.
2010), which confirmed a correlation between the time
scale of QSO variability and the physical parameters of
QSOs such as black hole mass. Although these studies
confirmed the correlation, different studies showed
a discrepancy in the time scales of QSO variability
(Kelly et al. 2009; Kozlowski et al. 2010; MacLeod et al.
2010). Possible reasons for the discrepancy are 1) poorly-sampled
lightcurves and/or short observational periods,
2) false positives such as stellar contamination in the
QSO candidates, and 3) QSO samples biased in luminosity
or black hole mass. Thus having a well-sampled set of
QSO lightcurves with a long baseline and a small number
of false positives is critical for a comprehensive analysis
of this correlation. Note that there are only a few hundred
well-sampled QSO lightcurves, and a large portion
of them are around the LMC fields that the MACHO
survey monitored for several years (e.g. see Geha et al.
2003; Kelly et al. 2009; Kozlowski et al. 2011).

The MACHO survey observed the sky around the 
LMC for 7.4 years with relatively regular sampling of 
a few days. The majority of the MACHO lightcurves 
have more than several hundred data points and there- 
fore the MACHO lightcurves are suitable for the QSO 
variability studies. Nevertheless, there are only 59 confirmed
MACHO QSOs in the 40 deg^2 area around the
LMC (Geha et al. 2003). The main reasons for the relatively
small number of QSOs are 1) the crowdedness of
the fields, which makes it difficult to select QSO candidates
among the dense stellar sources and thus yields
a high false positive rate (e.g. see Geha et al. 2003;
Dobrzycki et al. 2005), and 2) the high cost of spectroscopic
or X-ray observations, which are the best methods
for confirming QSOs. Thus a novel QSO selection algorithm
with a high efficiency and a low false positive rate
is essential to make the best use of the expensive
spectroscopic telescope time and increase the collection of
QSOs.

In our previous work (Kim et al. 2011), we developed
a QSO selection method using a supervised classification
model trained on a set of variability features extracted
from the MACHO lightcurves including a variety of variable
stars, non-variable stars and QSOs. The trained
model showed a high efficiency of 80% and a low false positive
rate of 25%. Using this method, we first selected
2,566 QSO candidates from the lightcurve database. We
then developed and employed a decision procedure on
the basis of diagnostics using 1) mid-IR colors, 2) photometric
redshifts, and 3) X-ray luminosities on these
candidates in order to separate high confidence QSO candidates
(hereinafter hc-QSOs). As a result, we chose in
total 663 hc-QSOs out of 2,566. These 663 candidates are
likely QSOs; if confirmed, this will increase the previous
collection of QSOs in the MACHO LMC database by a
factor of ~12. Note that most of the hc-QSO lightcurves
are well-sampled for 7.4 years (i.e. several hundred data
points with relatively regular sampling), and are chosen
in such a way as to exclude potential false positives.
Therefore the lightcurve collection of hc-QSOs is a valuable
set for QSO variability studies and can be used as
a target set for spectroscopic observations.

In Section 2 we briefly introduce the MACHO
database and the QSO selection algorithm that we developed
to select the initial set of QSO candidates. We
then present multiple diagnostics that we applied on the
set of QSO candidates in Section 3. Section 4 presents
a classification model trained on the diagnostic features
in order to choose hc-QSOs. In Section 5 we crossmatch
our candidates with newly discovered QSOs in the LMC
fields. A summary is given in Section 6.

2. QSO CANDIDATES IN THE MACHO LMC DATABASE 

We first selected QSO candidates from the MACHO
lightcurve database using the QSO selection method developed
by Kim et al. (2011) (hereinafter, K-method). In
this paper we used a 10% QSO probability product cut to
select the QSO candidates rather than the 25% cut which
Kim et al. (2011) used because we will employ other diagnostics
(see Section 3) that are able to effectively remove
false positives^1. Here the probability product is the
product of the probabilities derived independently from
the MACHO B and R band lightcurves using a Support Vector
Machine (Boser et al. 1992) and Platt's probability
estimation (Platt 1999). By definition, QSO candidates
with higher probabilities are more likely to be QSOs.
With the probability cut of 10%, we found 2,566 QSO
candidates.
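
The probability product cut described above can be illustrated with a minimal sketch. The function name and the sample probabilities below are hypothetical, and the use of a strict "greater than" comparison at the cut is an assumption; the per-band probabilities themselves would come from the K-method, which is not reproduced here.

```python
def select_by_probability_product(p_b, p_r, cut=0.10):
    """Return one boolean per source: True when the product of the
    B-band and R-band QSO probabilities exceeds `cut`.

    The two probabilities are treated as independent estimates,
    which is why they are simply multiplied.
    """
    return [pb * pr > cut for pb, pr in zip(p_b, p_r)]


# Hypothetical per-band probabilities for four sources.
p_b = [0.9, 0.5, 0.3, 0.6]
p_r = [0.8, 0.3, 0.3, 0.1]

loose = select_by_probability_product(p_b, p_r, cut=0.10)
strict = select_by_probability_product(p_b, p_r, cut=0.25)
print(loose, strict)
```

Lowering the cut from 25% to 10% admits more sources, which is why the later diagnostics are needed to remove the additional false positives.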

3. DIAGNOSTICS OF THE QSO CANDIDATES 

In the following subsections, we will introduce the di- 
agnostics performed and the consequent results. 

3.1. Spitzer mid-IR Properties 

It is known that mid-IR color selection is an efficient
discriminator for AGNs and stars/galaxies, resulting
from the fact that the spectral energy distributions
of these sources are substantially different
from each other (Laurent et al. 2000; Lacy et al. 2004).
Lacy et al. (2004) introduced a mid-IR color cut to separate
AGNs using the Spitzer SAGE (Surveying the Agents
of a Galaxy's Evolution; Meixner et al. 2006) catalog.
Kozlowski & Kochanek (2009) employed a similar mid-IR
color cut and selected about 5,000 AGN candidates
from the Spitzer SAGE catalog.

We used these mid-IR color selections as the first diag- 
nostic. We crossmatched our candidates with the Spitzer 
SAGE LMC catalog containing 6 million mid-IR objects 
in order to check whether our candidates are inside the 
mid-IR selection cuts. We searched for the nearest SAGE 
source from each candidate within a 1" search radius.

^1 A lower probability cut typically produces not only more QSO
candidates but also more false positives. 



In order to minimize false crossmatchings, we defined a 
source as a counterpart only if there are no other Spitzer 
sources within a 3" radius from the candidate. 
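
The counterpart rule above (nearest source within 1", rejected if any other source lies within 3") can be sketched as follows. This is an illustrative brute-force version under stated assumptions: the function names are hypothetical, coordinates are in degrees, and a real catalog of millions of sources would need a spatial index rather than a full scan.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula).
    All inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2)
         * math.sin((ra2 - ra1) / 2.0) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def isolated_counterpart(ra, dec, catalog,
                         match_radius=1.0, isolation_radius=3.0):
    """Return the index of the nearest catalog source if it lies within
    `match_radius` arcsec and no other catalog source lies within
    `isolation_radius` arcsec; otherwise return -1.

    `catalog` is a list of (ra, dec) tuples in degrees.
    """
    seps = [angular_sep_arcsec(ra, dec, cra, cdec) for cra, cdec in catalog]
    if not seps:
        return -1
    nearest = min(range(len(seps)), key=seps.__getitem__)
    if seps[nearest] > match_radius:
        return -1
    n_close = sum(1 for s in seps if s <= isolation_radius)
    return nearest if n_close == 1 else -1
```

The isolation test trades completeness for purity: a candidate in a crowded spot is left without a counterpart rather than risk a false match.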

We found about 700 Spitzer counterparts, shown in Figure 1
(dots). The sources inside region B could either be
AGNs or stars, while the sources inside region A are likely
AGNs. The YSO region is thought to be dominated by
Young Stellar Objects (YSOs) while the QSO region is
thought to be dominated by AGNs. Nevertheless, all the
sources inside these four regions are potential QSOs^2. Almost
all of the confirmed MACHO QSOs are inside these
four regions as shown in Figure 1 (boxes)^3. The candidates
inside these regions are most likely broad emission
line QSOs (i.e. Type I AGNs; Stern et al. 2005).
Among these counterparts, the sources inside both the
QSO and the A regions are likely to be QSOs. We found
that 469 QSO candidates are inside both the QSO and A
regions.

Figure 2 shows the estimated K-method QSO probability
products of these 469 candidates. As the histogram
shows, there are more QSO candidates at higher probability
than at lower probability, which implies that the mid-IR
diagnostic is in line with the K-method^4. In addition,
the histogram shows a bimodal distribution of the probabilities.
We will address this bimodality in the following
section.

3.2. Photometric Redshift Using Template Fitting 

We first crossmatched the 2,566 QSO candidates with
the UBVI catalog for the LMC (Zaritsky et al. 2004)
and the 2MASS catalog (Skrutskie et al. 2006) to extract
UBVI and JHK magnitudes. We searched for the nearest
source from each of the candidates within a 3" search
radius. In the case of the UBVI catalog, we found in total
2,375 counterparts. Among them, 84% (93%) of the UBVI
counterparts are within a 1" (1.5") distance from the
candidates. In addition, only 0.3% (2% or 17%) of the
candidates have another counterpart within a 1" (1.5"
or 3") distance from the candidates. Thus the fraction of
false crossmatches is not significant. In the case of
the 2MASS catalog, we found in total 846 counterparts.
Of those, 74% (83%) are within a 1" (1.5") distance
from the candidates while 0% (0.1% or 0.5%) of the candidates
have another counterpart within a 1" (1.5" or 3")
distance from the candidates. Again the fraction of
false crossmatches is negligible.

We then separated stars from 'Galaxies and AGNs'
(i.e. extragalactic sources) using a criterion proposed
by Eisenhardt et al. (2004) and Rowan-Robinson et al.
(2005). Figure 3 shows the criterion (the solid line)
we applied. There were 686 extragalactic sources (above
the cut) and 1274 stars (below the cut)^5. These
686 extragalactic sources were then fitted with galaxy
templates in order to derive photometric redshifts
(Rowan-Robinson et al. 2008). The templates contained
three QSO, one starburst and 10 galaxy templates. For

^2 The strongest statement is that QSOs are very unlikely to be
outside those four regions.

^3 There are 48 MACHO QSOs that were crossmatched with the
SAGE catalog.

^4 In the case of the entire 2,566 QSO candidates, the number of
candidates decreases at higher probability.

^5 We excluded the sources that do not have enough color information.

Fig. 1. - Mid-IR color-color and color-magnitude diagrams of the Spitzer SAGE counterparts of our QSO candidates (dots). Each
axis of the figure is either a Spitzer magnitude or a color. All sources inside the four regions A, B, QSO and YSO are potential QSOs
(Kozlowski & Kochanek 2009). There are 469 candidates inside both the QSO and A regions, which are the most promising QSO candidates.
The confirmed MACHO QSOs are also inside these four regions (boxes).

Fig. 2. - Histogram of K-method QSO probabilities of the SAGE
counterparts inside both the QSO and the A regions (see Figure 1). There
are more high probability candidates than low probability candidates,
which indicates that the candidates inside the QSO and the
A regions are likely to be QSOs. The histogram also shows a bimodal
distribution, as is addressed in Section 3.2.

details about the photometric redshift estimations and
the SED template fitting, see Rowan-Robinson et al.
(2008).

Among the extragalactic sources, 602 were fitted with
AGN templates (i.e. QSOs) while the remaining 84 were
fitted with the galaxy templates (i.e. galaxies). These
602 candidates are likely QSOs. Figure 4 shows the photometric
redshifts of these QSOs and galaxies. As the
figure shows, the QSOs (the top panel) have relatively
higher redshifts than the galaxies (the bottom panel).
QSOs are much more luminous than galaxies and thus
are detectable at higher redshifts than galaxies. In Figure 5
we show the comparison between the photometric
redshifts and the spectroscopic redshifts of the confirmed
MACHO QSOs (Geha et al. 2003). Out of the 58 confirmed
MACHO QSOs^6, 40 are fitted with the photometric
redshift code. The remaining 18 were not fitted due to
the lack of data (i.e. UBVI magnitudes). Among these
40 confirmed MACHO QSOs, only one was best fitted
with galaxy templates while the other 39 were fitted with
AGN templates. The QSO best fitted with the galaxy
templates is confirmed to be a QSO by the works of
Schmidtke et al. (1999) and Geha et al. (2003). Out of
the 40 QSOs, 28 (70%) are inside the ±0.1 dex accuracy
(the dashed line in the figure).

Figure 6 shows the K-method probability of QSOs,
galaxies and stars discriminated during the photometric
redshift estimation. As the figure shows, the majority
of QSOs have higher probabilities than galaxies
and stars, which implies that galaxies and stars have
different, and most likely weaker, variability characteristics
than QSOs. Note that the probabilities are from
the K-method, which mainly used variability features of
lightcurves to select QSO candidates.

^6 Note that 58 of the 59 MACHO QSOs were observed
several hundred times during the 7.4 years of observation while the
remaining MACHO QSO has only about 50 data points. We
excluded the QSO with 50 data points from the analysis in this
paper.

Fig. 3. - Criterion (the solid line) to separate extragalactic sources
('Galaxies and AGN' in the figure) from stars (Eisenhardt et al.
2004; Rowan-Robinson et al. 2005). Using the criterion, 686 candidates
were classified as extragalactic sources (above the line) and
1274 candidates were classified as stars (below the line).

Fig. 4. - Photometric redshifts of the 602 QSO candidates fitted
with the AGN templates (the top panel) and the 84 QSO
candidates fitted with the galaxy templates (the bottom panel)
(Rowan-Robinson et al. 2008). The 602 QSO candidates show relatively
larger redshifts than the 84 candidates.

Fig. 5. - Comparison between the spectroscopic redshifts
(Geha et al. 2003) and the photometric redshifts for the confirmed
MACHO QSOs. Seventy percent of the estimated redshifts are well-matched
with the spectroscopic redshifts (see the dashed line corresponding
to ±0.1 dex accuracy). There is one MACHO QSO
(triangle) that is fitted with the galaxy templates and 39 MACHO
QSOs (squares) that are fitted with the AGN templates
(Rowan-Robinson et al. 2008).

The left panel of Figure 6 also shows a bimodality similar
to that seen in Figure 2. In order to check if there exist 1)
different variability characteristics between QSOs, galaxies
and stars, and 2) different variability characteristics
between the high and low probability QSO candidates,
we show histograms of two variability features defined in
Kim et al. (2011) in Figure 7. The left 2x2 sub-panels
(left side A, B, C and D) show the histogram of σ/m,
where σ is the standard deviation and m is the mean
magnitude. In general σ/m is large when a lightcurve
has strong variability. The x-axis is scaled to be between
0 and 1. To check if differences exist between high and
low probability QSOs (A and B), we selected two subsets:
one of high (>80%) and the other of low (<40%) probability
QSOs. We included all galaxies (C) and stars
(D) regardless of their probabilities. As the left panels
show, galaxies and stars show different distributions
from the distribution of QSOs, which has a peak around
~0.3. Nevertheless, high and low probability QSOs do
not show different distributions. The right 2x2 sub-panels
(right side A, B, C and D) show a different time
variability index, Stetson K_AC, which characterizes how
the data points are distributed between the maximum
and minimum values of the autocorrelation function of a
lightcurve (Kim et al. 2011). As the panels show, high
probability QSOs (A) show a peak around 0.6 while low
probability QSOs (B) show a peak around 0.4. Galaxies
(C) and stars (D) show peaks around 0.7. Thus it seems
that the bimodality shown in the left panel of Figure 6
and the different distributions between QSOs, galaxies
and stars in Figure 6 are correlated with the different
variability characteristics of the lightcurves. Further analysis
of this bimodality, requiring careful investigation of many
variability characteristics and understanding of the selection
biases, is beyond the scope of this paper.
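
The two variability features discussed above can be sketched as follows. This is an illustration under stated assumptions rather than the definitions used by the K-method: the lightcurve is assumed to be evenly sampled, Stetson K is computed with unit weights, the lag range `max_lag` is a hypothetical choice, and the rescaling of the x-axis to [0, 1] used in Figure 7 is not applied.

```python
import math
from statistics import fmean, pstdev


def sigma_over_mean(mag):
    """Standard deviation of the magnitudes divided by their mean."""
    return pstdev(mag) / fmean(mag)


def stetson_k(values):
    """Stetson K with unit weights: mean |d| / sqrt(mean d^2), where d
    is the deviation from the mean. Undefined for constant input."""
    m = fmean(values)
    d = [v - m for v in values]
    return fmean(abs(x) for x in d) / math.sqrt(fmean(x * x for x in d))


def autocorrelation(mag, max_lag):
    """Sample autocorrelation for lags 1..max_lag (max_lag < len(mag))."""
    m = fmean(mag)
    x = [v - m for v in mag]
    var = sum(v * v for v in x)
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / var
            for k in range(1, max_lag + 1)]


def stetson_k_ac(mag, max_lag=100):
    """Stetson K evaluated on the autocorrelation function."""
    return stetson_k(autocorrelation(mag, max_lag))
```

With unit weights, K is 1 for a series that alternates between two values and about 0.8 for Gaussian-distributed values, so it describes the shape of the distribution rather than the amplitude.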

In addition, Figure 8 shows the mid-IR colors of QSOs,
galaxies and stars. As the figure shows, almost all of the 
QSOs (dots) are inside the four regions while most of 
the stars (triangles) are outside the regions. Galaxies 
(squares) are either inside or outside the regions. 

3.3. X-ray Luminosity 

In order to estimate the X-ray luminosity, we crossmatched
the 2,566 QSO candidates with two X-ray
point source catalogs: the Chandra X-ray source catalog
(Evans et al. 2010) and the XMM-Newton 2nd Incremental
Source catalog (Watson et al. 2009). We searched for
the nearest source within a 5" search radius from each
candidate. The majority of the crossmatched counterparts
were within a 3" distance from the candidates and
there were no additional counterparts within a 5" distance
from the candidates. We found 88 counterparts
from either the XMM or Chandra catalogs.

Amongst the 88 counterparts, 64 were fitted with the
SED templates mentioned in Section 3.2 and therefore
had estimated photometric redshifts. We used the photometric
redshifts and X-ray fluxes from the catalogs
to calculate the X-ray luminosity of each counterpart.
Figure 9 shows the photometric redshifts (x-axis) and
the estimated X-ray luminosity, log L_X (y-axis). In the
left panel, we show the 61 XMM counterparts including
eight confirmed MACHO QSOs. The right panel
shows 14 Chandra counterparts including three confirmed
MACHO QSOs. Almost all of the candidates (60)
have log L_X higher than 42. In addition, six confirmed
MACHO QSOs and 26 candidates show log L_X higher
than 44. The candidates showing log L_X higher than 44
(42) are likely to be QSOs (AGNs) (Elvis et al. 1994;
Persic et al. 2004). The remaining candidates that show
log L_X lower than 42 are likely to be galaxies.
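
The conversion from an X-ray flux and a photometric redshift to log L_X, and the thresholds at 42 and 44, can be sketched as follows. The cosmology is an assumption made only for illustration (flat ΛCDM with H0 = 70 km/s/Mpc and Ω_m = 0.3; the text does not state the parameters used), no K-correction is applied, the flux is assumed to be in erg/s/cm^2, and all function names are hypothetical.

```python
import math

C_KM_S = 299792.458                 # speed of light [km/s]
MPC_IN_CM = 3.0856775814913673e24   # one megaparsec in cm


def luminosity_distance_cm(z, h0=70.0, omega_m=0.3, n_steps=2000):
    """Luminosity distance in cm for an assumed flat LambdaCDM cosmology,
    using a trapezoidal integration of 1/E(z)."""
    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))

    dz = z / n_steps
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    integral += sum(inv_e(i * dz) for i in range(1, n_steps))
    integral *= dz
    comoving_mpc = C_KM_S / h0 * integral
    return (1.0 + z) * comoving_mpc * MPC_IN_CM


def log_lx(flux_cgs, z):
    """log10 of L_X = 4 pi d_L^2 F, for flux > 0 and z > 0."""
    d_l = luminosity_distance_cm(z)
    return math.log10(4.0 * math.pi * d_l ** 2 * flux_cgs)


def classify_by_lx(log_l):
    """Apply the thresholds quoted in the text."""
    if log_l > 44.0:
        return "QSO"
    if log_l > 42.0:
        return "AGN"
    return "galaxy"


# Hypothetical counterpart: flux of 1e-13 erg/s/cm^2 at z = 1.
print(classify_by_lx(log_lx(1e-13, 1.0)))
```

Because the redshift here is photometric, an error in z propagates directly into L_X, so sources close to a threshold should be treated with caution.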

We show the mid-IR colors of these X-ray counterparts
in Figure 10. The classification of QSOs (dots), AGNs
(x's) and galaxies (squares) is based on the X-ray luminosity
of the counterparts.

Fig. 6. - left: Histogram of the estimated K-method QSO probabilities for the 602 QSOs fitted with the AGN templates. The histogram shows
a bimodal distribution similar to the histogram shown in Figure 2. The bimodality is correlated with different variability characteristics
of the low and high probability QSO candidates. See the text and Figure 7 for details. right: Histogram of the estimated K-method QSO
probability of 84 galaxies (the top panel) and 1274 stars (the bottom panel) separated using an approach proposed by Eisenhardt et al.
(2004) and Rowan-Robinson et al. (2005). As the histogram clearly shows, they have relatively lower probabilities than QSOs.

Fig. 7. - left side A, B, C and D: Histogram of one of the time series features, σ/m (Kim et al. 2011). Galaxies and stars show different
distributions from both high and low probability QSOs while high and low probability QSOs do not show distinctive differences. right side
A, B, C and D: Histogram of Stetson K_AC (Kim et al. 2011). High probability QSOs show a different distribution from low probability
QSOs while galaxies and stars show almost identical distributions. As the histograms show, it seems that the bimodality in the left panel
of Figure 6 is correlated with the different variability characteristics of each class. Further analysis of this bimodality is beyond the scope
of this paper.

4. HIGH CONFIDENCE QSO CANDIDATE SELECTION 
USING SUPPORT VECTOR MACHINES 

4.1. Support Vector Machine 

SVM (Support Vector Machine; Boser et al. 1992) is
a supervised machine learning algorithm that trains a
two-class classification model using samples of two known
classes (i.e. a training set). SVM is currently one of the
best classification methods in machine learning. The
classifier of an SVM defines a linear hyperplane that separates
the two classes in the training data. To select a unique
hyperplane among the set of possible hyperplanes that
separate the data, SVM chooses the hyperplane which
maximizes the margin between the two classes, and is
therefore often called the maximum margin separator. SVM
is also able to separate non-linearly separable classes by
using a kernel function (e.g. a polynomial kernel or a radial
basis kernel) transforming non-linear feature spaces
into linear feature spaces. The hypothesis of SVM has
the form:

$$ \mathrm{Class}(z) = \mathrm{sign}\left( \sum_i \alpha_i y_i K(z, x_i) - b \right) \qquad (1) $$

where $i$ are the indices for the training set examples, $x_i$ are
the examples, $y_i$ are the labels, $z$ is the example that we
are predicting the label for, $K(z, x_i)$ is a kernel function,
and $b$ is a threshold. The $\alpha_i$ are the parameters learned
by the training procedure. Despite the mapping to a
potentially high dimensional space using a kernel function,
the maximum margin criterion leads to automatic
capacity control and thus avoids overfitting.
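
Equation (1) can be evaluated directly once the learned parameters are known. The sketch below is illustrative only: the support vectors, the α_i values and the threshold are hypothetical toy numbers rather than a trained model, and returning +1 when the argument of sign() is exactly zero is an assumption.

```python
import math


def linear_kernel(z, x):
    """K(z, x) = z . x"""
    return sum(zi * xi for zi, xi in zip(z, x))


def rbf_kernel(z, x, gamma=1.0):
    """K(z, x) = exp(-gamma * |z - x|^2), a radial basis kernel."""
    return math.exp(-gamma * sum((zi - xi) ** 2 for zi, xi in zip(z, x)))


def svm_predict(z, examples, alpha, labels, b, kernel=linear_kernel):
    """Evaluate sign(sum_i alpha_i * y_i * K(z, x_i) - b)."""
    score = sum(a * y * kernel(z, x)
                for a, y, x in zip(alpha, labels, examples))
    return 1 if score - b >= 0 else -1


# Toy "trained" model: two examples on either side of the origin.
examples = [[1.0, 0.0], [-1.0, 0.0]]
alpha = [1.0, 1.0]
labels = [1, -1]

print(svm_predict([2.0, 0.5], examples, alpha, labels, b=0.0))   # class +1
print(svm_predict([-3.0, 1.0], examples, alpha, labels, b=0.0))  # class -1
```

Swapping `linear_kernel` for `rbf_kernel` changes only how similarity to the examples is measured; the form of the decision rule stays the same.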

Compared to neural networks, SVMs provide a
flexible classification model, avoid the problems
of local minima, and reduce the need for parameter
tuning. For an overview, discussion and
practical details, see Cristianini & Shawe-Taylor
(2000); Bennett & Campbell (2000); Hsu et al. (2003);
Kim et al. (2011) and references therein. Because
a standard SVM can only solve a two-class problem,
Scholkopf et al. (2001) proposed a method to solve
one-class classification problems using SVM. In brief, they
define the origin as the second class and separate the one
class from the origin using SVM. For details about the
method, see Scholkopf et al. (2001); Manevitz & Yousef
(2001).

Fig. 8. - Mid-IR color-color and color-magnitude diagrams of the QSOs, galaxies and stars classified using the photometric redshift code.
See Section 3.2 for details. Each axis of the figure is either a Spitzer magnitude or a color. In the left panel, there are 502 QSOs (dots), 33
galaxies (squares) and 145 stars (triangles). In the right panel, there are 518 QSOs, 34 galaxies and 145 stars. As the figures show, almost
all of the QSOs and galaxies are inside the regions (QSO, YSO, A and B), which indicates that all of them are potential QSOs. On the
other hand, the majority of the stars are outside the regions.

Fig. 9. - Scatter plot of the photometric redshifts (x-axis) and the estimated X-ray luminosity, log L_X (y-axis). The dots are our QSO
candidates and the x's are the confirmed MACHO QSOs. left: XMM counterparts. right: Chandra counterparts. As the figures show,
most of our candidates and MACHO QSOs have log L_X > 42, which indicates they are likely QSOs.



4.2. Training a one-class SVM to Select High 
Confidence QSO Candidates 

We employed the one-class SVM classification method 
to select high confidence QSO candidates because we do 
not have negative examples (i.e. non-QSO training set). 
We used a linear kernel rather than a polynomial kernel 
or a radial basis kernel because we empirically found that 
using other kernels did not improve classification results. 



To train a model, we first defined the diagnostic results
as feature vectors. Table 1 summarizes the feature vectors.
When we could not determine a feature value due to
the absence of a counterpart in the Spitzer
SAGE, UBVI or X-ray catalogs, we assigned zero to
the corresponding feature. Figure 11 outlines the calculation
of the diagnostics and the number of candidates for
which the diagnostics are available. As mentioned above,
we started with the 2,566 QSO candidates selected using
the K-method ('Data Preparation' panel in the figure).
The diagnostics applied to these candidates are shown
in the 'High Confidence QSO Selection' panel. We also
show the number of QSO candidates after the diagnostics
(double-lined rectangles).
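
The encoding of the diagnostics into a feature vector can be sketched as follows. The integer codes follow the reading of Table 1 adopted here (0 for no counterpart, larger values for more QSO-like outcomes); the string labels and the function name are hypothetical, and treating the "value" feature as the photometric redshift from the SED fitting is an assumption.

```python
# Hypothetical label-to-code maps, following the reading of Table 1.
MID_IR_CODES = {"any_region": 1, "qso_and_a": 2}
SOURCE_TYPE_CODES = {"star": 1, "extragalactic": 2}
SED_CLASS_CODES = {"galaxy": 1, "agn": 2}
XRAY_CODES = {"galaxy": 1, "agn": 2, "qso": 3}


def encode_features(mid_ir=None, source_type=None, sed_class=None,
                    photo_z=None, chandra=None, xmm=None):
    """Build the six-element feature vector for one candidate.

    None (or an unrecognized label) means that no counterpart or no
    value is available, in which case the feature is set to zero.
    """
    return [
        MID_IR_CODES.get(mid_ir, 0),
        SOURCE_TYPE_CODES.get(source_type, 0),
        SED_CLASS_CODES.get(sed_class, 0),
        float(photo_z) if photo_z is not None else 0.0,
        XRAY_CODES.get(chandra, 0),
        XRAY_CODES.get(xmm, 0),
    ]


# A candidate with mid-IR, UBVI and XMM counterparts but no Chandra match.
print(encode_features("qso_and_a", "extragalactic", "agn", 1.2, None, "qso"))
```

Using zero for missing counterparts keeps every vector the same length, at the cost of making "no data" indistinguishable from a genuinely low value in the numeric feature.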

We trained a one-class SVM model using these features^7.
We then tuned the model by adjusting the threshold,
b, in order to: 1) obtain the highest efficiency based
on the confirmed 58 MACHO QSOs, and 2) minimize 
the number of selected QSO candidates, which reduces 



^7 We used the LIBSVM package (Chang & Lin 2001).

Fig. 10. - Mid-IR color-color and color-magnitude diagrams of the QSOs, galaxies and stars classified using the X-ray luminosity. See
Section 3.3 for details. Each axis of the figure is either a Spitzer magnitude or a color. As the figures show, almost all of the X-ray counterparts
are within the QSO and the A regions. The candidates inside the QSO and the A regions are very likely QSOs (Kozlowski & Kochanek
2009).



the number of false positives as well. Figure 12 shows
the efficiency and the number of candidates as a function
of b. The black square shows the threshold we
finally adopted. Using the determined threshold, the
trained model showed 74% efficiency. We applied the
tuned model to the 2,566 QSO candidates and selected
663 QSO candidates (i.e. hc-QSOs).
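
The trade-off behind the choice of b can be sketched as a simple sweep. This is illustrative only: the scores stand for the value of the sum in Equation (1) before the threshold is subtracted, and the numbers below are hypothetical, not output of the trained model.

```python
def sweep_threshold(train_scores, cand_scores, thresholds):
    """For each threshold b, return (b, efficiency, n_selected).

    efficiency : fraction of confirmed QSOs with score - b >= 0
    n_selected : number of candidates with score - b >= 0
    """
    rows = []
    for b in thresholds:
        n_recovered = sum(1 for s in train_scores if s - b >= 0)
        efficiency = n_recovered / len(train_scores)
        n_selected = sum(1 for s in cand_scores if s - b >= 0)
        rows.append((b, efficiency, n_selected))
    return rows


# Hypothetical scores for confirmed QSOs and for all candidates.
train_scores = [0.2, 0.5, 0.9, 1.0]
cand_scores = [0.1, 0.3, 0.6, 0.8, 1.2]

for b, eff, n in sweep_threshold(train_scores, cand_scores, [0.0, 0.5, 1.0]):
    print(f"b={b:.1f}  efficiency={eff:.2f}  selected={n}")
```

Raising b lowers both the efficiency and the number of selected candidates, so the adopted value is a compromise between the two goals listed in the text.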

Table 2 shows a few important parameters for some
of the QSO candidates. The full set of parameters of
the 2,566 QSO candidates is published in the electronic
edition of this manuscript. We also provide
catalogs and lightcurves of all the candidates at
http://timemachine.iic.harvard.edu/coati/QSOs

5. CROSSMATCHING WITH NEWLY DISCOVERED QSOS 
BY KOZLOWSKI (2011) 

Recently, Kozlowski et al. (2011) selected QSO candidates
using mid-IR colors, X-ray emission and/or
optical variability in the OGLE lightcurve database
(Udalski et al. 2008). For the variability selection, they
used the DRW (Damped Random Walk) model of
lightcurves (Kelly et al. 2009; Kozlowski et al. 2010) and
then applied several cuts including magnitude, model fitting
accuracy^8, slope of a structure function, amplitude
and time scale of lightcurve variations. They then visually
examined all the lightcurves of the candidates and
removed about 96% of the lightcurves (~23,000) from the
final list. Most of the false positives were 'ghost' variable
objects caused by photometric defects. They finally
observed 845 QSO candidates using AAT/AAOmega^9
and confirmed 169 QSOs including 25 previously known
QSOs^10 (i.e. 144 newly discovered QSOs) in the four ~3
deg^2 fields near the LMC center. They also provided the
list of the remaining 676 objects. Among these 676 objects,
they confirmed that 275 are non-QSOs, including young
stellar objects (YSOs), red stars, blue stars, Be stars and
planetary nebulae^11.

^8 The likelihood ratio between the best fitting model and a white
noise model.

^9 AAT: Anglo-Australian Telescope; AAOmega: the AAT multi-purpose
fiber-fed spectrograph (Sharp et al. 2006).

^10 18 of them are on the confirmed MACHO QSO list and seven
of them are not on the confirmed MACHO QSO list.

To estimate the efficiency and the false positive rate
of our selection method, we first crossmatched the 151
discovered QSOs^12 and 275 confirmed non-QSOs (i.e.
false positives) with the entire MACHO LMC lightcurve
database. We searched for the nearest MACHO LMC source
within a 3" search radius. Out of the 151 QSOs and 275
non-QSOs, 64 and 122 were crossmatched with the MACHO
sources. Note that only 46 out of the 64 were selected
using variability characteristics in the OGLE-III
lightcurves (Kozlowski et al. 2011).

Among these 46 QSOs, 20 are in the hc-QSO list (hereinafter,
c-QSOs) and 26 are not in the hc-QSO list (hereinafter,
cn-QSOs), which gives us 43% efficiency. It
is worth mentioning that the yield of QSO candidates
from Kozlowski et al. (2011) selected using only variability
based on the DRW model was 7%.

Despite of the fact that these 46 QSOs were determined 
to be variable objects based on the optical OGLE-HI 
lightcuves, some of them do not show strong variability 
in the MACHO lightcurves because of 1) the difference 
of the limiting magnitudes of the two survey, and 2) the 
photometric uncertainty of the MACHO lightcurves. For 
instance, we found that 11 of cn-QSOs are fainter than 
19 MACHO R magnitude (mji) while only two of c-QSOs 
are fainter than 19 m^j, which is around a limiting mag- 
nitude of MACHO survey (Figure [15]). Thus it is likely 
that the K-method using variability was not able to de- 
tect some of the QSOs due to the large photometric un- 
certainty and thus weak variability. Figure [131 shows the 
histogram of the ratio between the average photomet- 
ric uncertainty and standard deviation (i.e. amplitude), 
cr/e, of the lightcurves of c-QSOs and cn-QSOs. Small 
cr/e means that the photometric uncertainty is relatively 
larger than the amplitude of the lightcurve, which im- 
plies that it is rather hard to detect its variability. As 
the figure shows, c-QSOs have relatively larger cr/e than 
cn-QSOs, which means c-QSOs are more detectable than 
cn-QSOs using their variability, a is one of the time vari- 

The remaining sources had undetermined classification. 
144 newly discovered QSOs and seven previously known QSOs 
that are not on the confirmed MACHO QSO list. 
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TABLE 1 
Feature Vectors 



mid-IR 


extragalactic sources / stars 


SED fitting 




Chandra 


XMM 




no CP^ : 


no CP : 


no CP : 


no CP : 


no CP : 


no CP : 





inside any of the four regions : 1 


stars : 1 


galaxies : f 


value'' 


galaxies : 1 


galaxies 


1 


inside both the QSO and A region : 2 


extragalactic sources : 2 


AGNs : 2 


AGNs : 2 


AGNs : 


2 










QSOs : 3 


QSOs : 


3 



^no counterpart. 

is from the SED fitting. 



[Figure 11: flowchart. Labels recovered from the figure:
Data Preparation: MACHO DB (20 million) -> K-method -> 2566 candidates.
High Confidence QSO Selection:
  Photometric Redshifts Using Template Fitting: Counterparts with UBVI and SAGE? Extragalactic sources? (by color cut): 586 candidates. AGNs? (by SED fitting): 602 candidates.
  Mid-IR Colors: Counterparts with SAGE? Inside the AGN region?: 600 candidates. Inside both the QSO and the A region?: 469 candidates.
  X-Ray Luminosity: Counterparts with X-ray catalogs?: 64 candidates. AGN if X-ray luminosity > 42 and < 44: 26 candidates. QSO if X-ray luminosity > 44: 34 candidates.]

Fig. 11. — Illustration of the processes that we used to select hc-QSOs. The rectangles with bold borderlines are the diagnostics. At
most of the diagnostics, we determined if the candidates are likely to be QSOs (solid line arrows). The thin arrows show the data flow.
The double-lined rectangles show the number of candidates.




Fig. 12. — Efficiency versus number of selected QSO candidates 
as a function of the SVM threshold, b. The black square shows the 
final threshold we adopted. 






TABLE 2
Several Important Parameters of the QSO candidates

MACHO ID       RA           Dec          V      hc-QSO^a
               (in degree)  (in degree)  (mag)
11.8747.1083   83.52708     -70.62689    18.98  1
11.8753.346    83.66207     -70.20544    18.72
11.8984.29     83.89623     -70.89459    18.16
11.8989.258    84.04636     -70.61672    18.67
11.8994.1323   83.91927     -70.25463    19.27  1
11.9349.1074   84.53299     -70.81329    20.26  1
11.9353.1217   84.52798     -70.50399    19.52
12.10679.528   86.51372     -70.85550    18.99
13.5834.232    79.19451     -71.16704    19.57
13.6446.758    80.00723     -70.74329    20.32
13.6448.3756   80.03121     -70.59326    19.04
13.6560.555    80.25070     -71.21474    19.68

Note: This table is published in its entirety in the electronic edition
of this manuscript. A portion is shown here for guidance regarding
its form and content.
^a 1: high confidence QSO candidate



ability features that the K-method used. 
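
As an illustration of the ratio of the lightcurve standard deviation to the average photometric uncertainty discussed above, the following is a minimal sketch. The function name and the sample magnitudes are hypothetical, and the use of the population standard deviation is an assumption; the K-method's own implementation may differ.

```python
from statistics import fmean, pstdev


def amplitude_to_uncertainty_ratio(mags, mag_errs):
    """Ratio of lightcurve scatter to mean photometric uncertainty.

    mags     : measured magnitudes of one lightcurve
    mag_errs : per-point photometric uncertainties (same length as mags)
    """
    if len(mags) != len(mag_errs) or len(mags) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    sigma = pstdev(mags)        # amplitude proxy (population standard deviation, assumed)
    epsilon = fmean(mag_errs)   # average photometric uncertainty
    return sigma / epsilon


# Hypothetical values, not MACHO data.
bright_ratio = amplitude_to_uncertainty_ratio([18.0, 18.4, 18.0, 18.4], [0.05] * 4)
faint_ratio = amplitude_to_uncertainty_ratio([19.5, 19.9, 19.5, 19.9], [0.20] * 4)
```

With the same intrinsic scatter, the fainter example has a smaller ratio only because its uncertainties are larger, which is the effect the text attributes to the cn-QSOs.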

In Figure 14 we show an alternative way of seeing the
variability characteristic of a lightcurve by borrowing
one example of the time series features, R_cs (Ellaway
1978), used in the K-method. R_cs, the range of a
cumulative sum, is typically large for variables showing
non-periodic and strong variability, and is small for
periodic variables or non-variables. As the figure shows, the
histogram of c-QSOs (the top panel) has a peak around
6 while the histogram of cn-QSOs shows a peak around
3 (the bottom panel).
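
To make the range of a cumulative sum concrete, here is a minimal sketch. It assumes the partial sums of the mean-subtracted magnitudes are normalized by N times the standard deviation; that normalization, the function name, and the sample lightcurves are assumptions of this example, so absolute values need not match the scale of Figure 14.

```python
from itertools import accumulate
from statistics import fmean, pstdev


def range_of_cumulative_sum(mags):
    """Range of the normalized cumulative sum of a lightcurve.

    S_l = (1 / (N * sigma)) * sum_{i=1..l} (m_i - mean); result = max(S) - min(S).
    The 1 / (N * sigma) normalization is an assumption of this sketch.
    """
    n = len(mags)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = fmean(mags)
    sigma = pstdev(mags)  # population standard deviation
    if sigma == 0:
        raise ValueError("constant lightcurve: the range is undefined")
    partial_sums = list(accumulate(m - mean for m in mags))
    scaled = [s / (n * sigma) for s in partial_sums]
    return max(scaled) - min(scaled)


# Hypothetical lightcurves (magnitudes), not MACHO data.
slow_trend = [18.0 + 0.01 * i for i in range(100)]  # long-term, non-periodic drift
fast_flicker = [18.0, 18.1] * 50                    # rapid alternation about the mean

r_trend = range_of_cumulative_sum(slow_trend)
r_flicker = range_of_cumulative_sum(fast_flicker)
```

The slow drift keeps the partial sums on one side of zero for a long time and so gives a much larger range than the rapid alternation, which mirrors the qualitative behavior described in the text.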

In addition, we show the MACHO lightcurves of the
20 c-QSOs and 26 cn-QSOs in Figure 16 and Figure 17.
As Figure 16 shows, most of the c-QSOs show strong
variability. On the other hand, Figure 17 shows that
most of the cn-QSOs fainter than 19 m_R show relatively
weaker variability than the c-QSOs. Only
cn-QSOs brighter than 19 m_R show strong variability
comparable to that of c-QSOs.

According to Figures 13, 14, 16, and 17, it seems that
the main reason for the non-detection of QSOs is the



Fig. 13. — Histogram of the ratio of the amplitude to the photometric
uncertainty, σ/ε, of c-QSOs (the top panel) and cn-QSOs
(the bottom panel). See the text for details about c-QSOs and
cn-QSOs. A small σ/ε means that the photometric uncertainty is too
large to detect variability. c-QSOs show relatively larger σ/ε than
cn-QSOs, which means that c-QSOs are more detectable than
cn-QSOs using variability.







Fig. 14. — Histogram of R_cs of c-QSOs (the top panel) and
cn-QSOs (the bottom panel). c-QSOs and cn-QSOs show different
distributions. See the text for details.

relatively weaker variability. Thus if we ignore some of
the QSOs showing weak variability, our efficiency would
be higher than 43%. For instance, if we ignore the 11
cn-QSOs fainter than 19 m_R, our efficiency increases to
57%.
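
The efficiency figures quoted above can be checked with a short calculation. This sketch assumes that the efficiency is the number of c-QSOs divided by the total number of c-QSOs and cn-QSOs; that definition is inferred from the counts in the text (20 c-QSOs, 26 cn-QSOs, 11 faint cn-QSOs) and reproduces the quoted percentages.

```python
def efficiency(n_recovered, n_missed):
    """Fraction of confirmed QSOs that appear in the candidate list."""
    return n_recovered / (n_recovered + n_missed)


N_C_QSO = 20         # c-QSOs, from the text
N_CN_QSO = 26        # cn-QSOs, from the text
N_FAINT_CN_QSO = 11  # cn-QSOs fainter than 19 m_R, from the text

eff_all = efficiency(N_C_QSO, N_CN_QSO)
eff_bright = efficiency(N_C_QSO, N_CN_QSO - N_FAINT_CN_QSO)

print(f"all confirmed QSOs:        {eff_all:.1%}")     # 43.5%
print(f"ignoring faint cn-QSOs:    {eff_bright:.1%}")  # 57.1%
```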

In the case of the false positives, only two out of 122
confirmed non-QSOs are inside the hc-QSO list, which
gives a 0.3% false positive rate. The two false positives are
YSOs. We examined their MACHO lightcurves
and confirmed that they show strong variability. Note
that Kozlowski et al. (2011) monitored 12 deg^2 fields
around the LMC that are mostly inside the 40 deg^2
MACHO LMC fields. Given that our QSO candidates are




Fig. 15. — Luminosity function of MACHO R magnitude from 
one MACHO field. The x-axis is MACHO R magnitude and the 
y-axis is the number of MACHO sources. As the figure shows, the 
limiting R magnitude is around 19 ~ 19.5. 

uniformly distributed around the LMC, we would have
about one third of the hc-QSOs (12/40) inside
the fields that Kozlowski et al. (2011) monitored. In that
case, the false positive rate is about 1%. However, the
true false positive rate would be higher than 1% because
Kozlowski et al. (2011) did not monitor all the sources
in the fields, which means some of our QSO candidates
are not in their list. Nevertheless, these 122 non-QSOs
were selected not only by variability but also by
mid-IR colors and X-ray emission. Thus it seems that our
method is successful in excluding any type of false
positive, which is crucial for the selection of QSO
candidates from massive astronomical databases such as
Pan-STARRS (Kaiser 2004) and LSST (Ivezic et al. 2008) due
to: 1) the enormous amount of data, which could
yield a huge number of false positives, and 2) the high cost
of spectroscopic observations for such deep and wide field
surveys.
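
The two false positive rates quoted above follow from simple arithmetic, sketched here. The calculation assumes, as the text does, that the candidates are uniformly distributed over the MACHO LMC fields; the variable names are illustrative.

```python
N_HC_QSO = 663             # high confidence QSO candidates, from the text
N_FALSE_POSITIVES = 2      # confirmed non-QSOs found in the hc-QSO list
AREA_MONITORED_DEG2 = 12   # area monitored by Kozlowski et al. (2011)
AREA_MACHO_DEG2 = 40       # MACHO LMC fields

# Rate relative to the full candidate list.
fpr_full = N_FALSE_POSITIVES / N_HC_QSO

# Rate relative to the candidates expected inside the monitored area,
# assuming a uniform spatial distribution of candidates.
n_in_monitored = N_HC_QSO * AREA_MONITORED_DEG2 / AREA_MACHO_DEG2
fpr_scaled = N_FALSE_POSITIVES / n_in_monitored

print(f"relative to all hc-QSOs:      {fpr_full:.1%}")    # 0.3%
print(f"scaled to the monitored area: {fpr_scaled:.1%}")  # 1.0%
```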

6. SUMMARY 

In this paper, we presented 663 high confidence QSO
candidates in the LMC fields. We first selected 2,566
QSO candidates based on the time variability of MACHO
B and R band lightcurves in the MACHO LMC lightcurve
database using the method of Kim et al. (2011). We then
applied multiple diagnostics such as mid-IR color,
photometric redshift and X-ray luminosity to these QSO
candidates. Using the outputs of the diagnostics, we trained a
one-class SVM model to discriminate high confidence QSO
candidates. We finally applied the trained model to the
original candidates and selected 663 QSO candidates.

To estimate the yield and false positive rate of the
final list, we crossmatched them with recently confirmed
QSOs and non-QSOs in the LMC field (Kozlowski et al.
2011). As a result, we found that the yield is higher
than 43%. It is worth mentioning that the yield of
the QSO candidates selected using the 'damped random
walk' model (Kelly et al. 2009) is 7% (Kozlowski et al.
2011). In the case of the false positive rate, we found
that there are only a few confirmed non-QSOs in our list,
which corresponds to a false positive rate of less than 1%. Thus this set
could be used as a potential target set for a spectroscopic
survey to maximize the yield. This is important because
spectroscopic observations of relatively faint objects
such as the QSO candidates in the dense and wide-field area
around the LMC are extremely expensive. We are
planning to use the confirmed QSOs and confirmed non-QSOs
to improve our QSO selection method. This work will be
published separately in the near future.
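
The positional crossmatch mentioned in this summary can be illustrated with a small sketch. The matching radius of 3 arcsec, the function names, and the reference entries are hypothetical; only the two candidate positions are taken from Table 2. The brute-force loop is adequate for a few thousand sources but is not how a large survey catalog would be matched.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def crossmatch(candidates, reference, radius_arcsec=3.0):
    """Return (candidate_id, reference_id) for the nearest reference source
    within radius_arcsec of each candidate. The radius is a hypothetical value."""
    matches = []
    for cand_id, cand_ra, cand_dec in candidates:
        best = None
        for ref_id, ref_ra, ref_dec in reference:
            sep = angular_separation_arcsec(cand_ra, cand_dec, ref_ra, ref_dec)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (ref_id, sep)
        if best is not None:
            matches.append((cand_id, best[0]))
    return matches


# Candidate positions from Table 2; reference entries are made up for the example.
candidates = [("11.8747.1083", 83.52708, -70.62689),
              ("11.8753.346", 83.66207, -70.20544)]
reference = [("ref-A", 83.52710, -70.62690),
             ("ref-B", 84.00000, -70.00000)]

matched = crossmatch(candidates, reference)
```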

We will apply our method to the MACHO SMC/bulge 
database and the Pan-STARRS MDF (Medium Deep 
Field) time series database to further select QSO candi- 
dates and thus increase the collection of QSO lightcurves. 
Fig. 17. — MACHO B band lightcurves of cn-QSOs. Compared to the lightcurves shown in Figure 16, these lightcurves show
relatively weaker variability. Moreover, there are many more faint lightcurves than in Figure 16.